Document Decomposition for XML Compression: A Heuristic Approach

نویسنده

  • Byron Choi
چکیده

Sharing of common subtrees has been reported useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irregularities among subtrees dramatically reduce the performance of compression algorithms of this kind. Furthermore, when XML documents are large, the chance of having large number of identical subtrees is inherently low. In this paper, we proposed a method of decomposing XML documents for better compression. We proposed a heuristic method of locating minor irregularities in XML documents. The irregularities are then projected out from the original XML document. We refered this process to as document decomposition. We demonstrated that better compression can be achieved by compressing the decomposed documents separately. Experimental results demonstrated that the compressed skeletons, for all real-world datasets, to our knowledge, fit comfortably into main memory of commodity computers nowadays. Preliminary results on querying compressed skeletons validate the effectiveness our approach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As moving to big data world where data is increasing in unstructured way with high velocity, there is a need of data-store to store this bundle amount of data. Traditionally, relational databases are used which are now not compatible to handle this large amount of data, so it is needed to move on to non-relational data-stores. In the current study, we have proposed an extension of the Mongo...

متن کامل

Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML

As moving to big data world where data is increasing in unstructured way with high velocity, there is a need of data-store to store this bundle amount of data. Traditionally, relational databases are used which are now not compatible to handle this large amount of data, so it is needed to move on to non-relational data-stores. In the current study, we have proposed an extension of the Mongo...

متن کامل

Hcmx: an Efficient Hybrid Clustering Approach for Multi-version Xml Documents

In order to retrieve useful information from large number of growing XML documents on the web, effective management of XML document is essential. One solution is to cluster XML documents to find knowledge that promote effective information management and maintenance. But in the real world XML documents are dynamic in nature. In contrast to static XML documents, changes from one version of XML d...

متن کامل

A New Prototype of XML Compression Technique

XML makes data flexible in representation and easily portable on the Web but it also substantially inflates data size as a consequence of using tags to describe data. Although many effective XML compressors, such as XMill, have been recently proposed to solve this data inflation problem, they do not address the problem of running queries on compressed XML data. More recently, some compressors h...

متن کامل

An MPEG-7 tool for compression and streaming of XML data

In the course of the work on the MPEG-7 standard a binary format with special features for the encoding of XML data was required. These required key features are a high data compression ratio, provision for streaming, dynamic update of the document structure and fast random access of data entities in the compressed stream. To support these features we propose a novel, schema-aware approach whic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006